In this example we will examine multivariate forecasting models using mvgam, which fits GAMs using MCMC sampling via the JAGS software (Note that JAGS 4.3.0 is required; installation links are found here). First a simulation experiment to determine whether mvgam's inclusion of complexity penalisation works by reducing the number of un-needed dynamic factors. In any factor model, choosing the appropriate number of factors K can be difficult. The approach used by mvgam is to estimate a penalty for each factor that squeezes the factor's variance toward zero, effectively forcing the factor to evolve as a flat white noise process. By allowing each factor's penalty to be estimated in an exponentially increasing manner (following Welty, Leah J., et al. Bayesian distributed lag models: estimating effects of particulate matter air pollution on daily mortality Biometrics 65.1 (2009): 282-291), we hope that we can guard against specying too large a K. But we should test this property to see how it behaves in different scenarios. Begin by simulating 8 series that evolve with a shared seasonal pattern and that depend on 2 latent random walk factors. Each series is 100 time steps long, with a seasonal frequency of 12. We give the trend moderate importance by setting trend_rel = 0.6 and we allow each series' observation process to be drawn from slightly different Negative Binomial distributions
set.seed(100)
library(mvgam)
## Loading required package: mgcv
## Loading required package: nlme
## This is mgcv 1.8-33. For overview type 'help("mgcv-package")'.
## Loading required package: rjags
## Loading required package: coda
## Linked to JAGS 4.3.0
## Loaded modules: basemod,bugs
## Loading required package: parallel
## Loading required package: runjags
## Warning: package 'runjags' was built under R version 4.0.5
dat <- sim_mvgam(T = 100, n_series = 8, n_trends = 2,
mu_obs = runif(8, 2, 6), size_obs = runif(8,
0.5, 2), trend_rel = 0.6, train_prop = 0.8)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
Have a look at the series
par(mfrow = c(4, 2))
for (i in 1:8) {
plot(dat$data_train$y[which(as.numeric(dat$data_train$series) ==
i)], type = "l", ylab = paste("Series",
i), xlab = "Time")
}
par(mfrow = c(1, 1))
Clearly there are some correlations in the trends for these series. But how does a dynamic factor process allow us to potentially capture these dependencies?